Search CORE

93 research outputs found

PCA and K-Means decipher genome

Author: A Zinovyev
AN Gorban
AY Zinovyev
FHC Crick
HY Ou
J Jackson
R Staden
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008

In this paper, we aim to give a tutorial for undergraduate students studying statistical methods and/or bioinformatics. The students will learn how data visualization can help in genomic sequence analysis. Students start with a fragment of the genetic text of a bacterial genome and analyze its structure. By means of principal component analysis they "discover" that the information in the genome is encoded by non-overlapping triplets. Next, they learn how to find gene positions. This exercise on PCA and K-Means clustering enables active study of basic bioinformatics notions. Appendix 1 contains the program listings that accompany this exercise. Appendix 2 includes 2D PCA plots of triplet usage in a moving frame for a series of bacterial genomes, from GC-poor to GC-rich ones. Animated 3D PCA plots are attached as separate GIF files. The topology (cluster structure) and geometry (mutual positions of clusters) of these plots clearly depend on GC-content. Comment: 18 pages, with program listings for MatLab, PCA analysis of genomes and additional animated 3D PCA plots
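The core of the exercise is turning genetic text into points that PCA and K-Means can work with. A minimal Python sketch of that preprocessing, assuming windows are read from their first base and that triplets containing non-ACGT symbols are skipped, might look like this; `triplet_frequencies`, `moving_frame` and `kmeans` are hypothetical names, not the MatLab listings of Appendix 1, and PCA itself is left to a linear-algebra library.

```python
import random
from itertools import product

# The 64 possible triplets in a fixed order; each window becomes a 64-dimensional vector.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]
INDEX = {t: i for i, t in enumerate(TRIPLETS)}


def triplet_frequencies(window):
    """Frequencies of non-overlapping triplets read from the first base of the window."""
    counts = [0] * len(TRIPLETS)
    for i in range(0, len(window) - 2, 3):
        t = window[i:i + 3]
        if t in INDEX:  # triplets containing N or other symbols are skipped
            counts[INDEX[t]] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def moving_frame(seq, width, step):
    """Triplet-frequency vectors for windows of `width` bases shifted by `step` bases."""
    return [triplet_frequencies(seq[i:i + width])
            for i in range(0, len(seq) - width + 1, step)]


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-Means: returns a cluster label per point and the centroids."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid in squared Euclidean distance.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

Shifting a window by one base changes which triplets are counted, which is why windows falling in different reading frames land in different regions of the 64-dimensional space. The window width, step and number of clusters are left to the caller; a choice such as `k = 7` (three phases on each strand plus non-coding) is an assumption to be checked against the PCA plots, not a value taken from this listing.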

arXiv.org e-Print Archive

CiteSeerX

Crossref

PCA Beyond The Concept of Manifolds: Principal Trees, Metro Maps, and Elastic Cubic Complexes

Author: A Gorban
A Gusev
A Zinovyev
AN Gorban
AY Zinovyev
B Kégl
CM Bishop
E Erwin
F Mulier
FHC Crick
H Ritter
J Einbeck
K Pearson
M Löwe
M Nagl
R Shyamsundar
S Matveev
T Hastie
T Kohonen
TM Martinetz
VA Dergachev
YF Leung
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 30/12/2007

Multidimensional data distributions can have complex topologies and variable local dimensions. To approximate complex data, we propose a new type of low-dimensional "principal object": a principal cubic complex. This complex is a generalization of linear and non-linear principal manifolds and includes them as a particular case. To construct such an object, we combine a method of topological grammars with the minimization of an elastic energy defined for its embedding into multidimensional data space. The whole complex is presented as a system of nodes and springs and as a product of one-dimensional continua (represented by graphs), and the grammars describe how these continua transform during the process of optimal complex construction. The simplest case of a topological grammar ("add a node", "bisect an edge") is equivalent to the construction of "principal trees", an object useful in many practical applications. We demonstrate how it can be applied to the analysis of bacterial genomes and to the visualization of cDNA microarray data using the "metro map" representation. The preprint is supplemented by an animation: "How the topological grammar constructs branching principal components (AnimatedBranchingPCA.gif)". Comment: 19 pages, 8 figures
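To make the "nodes and springs" picture concrete, here is a hedged Python sketch of an elastic energy for a graph embedded among data points, together with the two grammar operations named above. It assumes an energy of the form approximation + λ·(stretching over edges) + μ·(bending over ribs), where a rib is a node of degree 2 with its two neighbours; branching nodes would need an additional "star" penalty that is omitted here, and the step of choosing the operation that minimizes the energy after refitting is not shown. All function names are hypothetical.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def ribs_from_edges(n_nodes, edges):
    """A rib (i, j, k) for every node j of degree 2, with neighbours i and k."""
    nbrs = {j: [] for j in range(n_nodes)}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    return [(v[0], j, v[1]) for j, v in nbrs.items() if len(v) == 2]


def elastic_energy(data, nodes, edges, lam, mu):
    """Approximation + stretching + bending energy of a graph embedded among the data."""
    # Approximation term: mean squared distance from each point to its nearest node.
    u_y = sum(min(sq_dist(x, y) for y in nodes) for x in data) / len(data)
    # Stretching term: springs along the edges.
    u_e = lam * sum(sq_dist(nodes[i], nodes[j]) for i, j in edges)
    # Bending term: penalise ribs that deviate from a straight line.
    u_r = mu * sum(
        sum((a + c - 2 * b) ** 2 for a, b, c in zip(nodes[i], nodes[j], nodes[k]))
        for i, j, k in ribs_from_edges(len(nodes), edges))
    return u_y + u_e + u_r


def bisect_edge(nodes, edges, e):
    """Grammar operation 'bisect an edge': insert a node at the midpoint of edge e."""
    i, j = edges[e]
    mid = [(a + b) / 2 for a, b in zip(nodes[i], nodes[j])]
    m = len(nodes)
    return nodes + [mid], edges[:e] + edges[e + 1:] + [(i, m), (m, j)]


def add_node(nodes, edges, i, position):
    """Grammar operation 'add a node': attach a new leaf to node i."""
    m = len(nodes)
    return nodes + [list(position)], edges + [(i, m)]
```

For two data points at (0, 0) and (2, 0) covered by a single edge, bisecting the edge halves the stretching energy (from 4λ to 2λ) while the new straight rib adds no bending energy, which is the kind of trade-off a grammar-driven construction compares when choosing its next operation.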

arXiv.org e-Print Archive

Crossref

Elastic Maps and Nets for Approximating Principal Manifolds and Their Application to Microarray Data Visualization

Author: A Gorban
A Gusev
A Zinovyev
A. N. Gorban
AJ Smola
AN Gorban
AY Zinovyev
B Kégl
B Mirkin
B Schölkopf
CM Bishop
CM Perou
D Stanford
DG Kendall
E Erwin
F Mulier
H Ritter
H Yin
H Zou
JB Tenenbaum
JD Banfield
K Pearson
Kégl
L Aizenberg
L Dyrskjot
M Born
M Fréchet
M LeBlanc
M Oja
R Durbin
R Sayle
R Shyamsundar
S Kaski
S Roweis
T Hastie
T Kohonen
VA Dergachev
W Cai
Y Wang
YF Leung
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 30/12/2007

Principal manifolds are defined as lines or surfaces passing through "the middle" of a data distribution. Linear principal manifolds (Principal Component Analysis) are routinely used for dimension reduction, noise filtering and data visualization. Recently, methods for constructing non-linear principal manifolds were proposed, including our elastic maps approach, which is based on a physical analogy with elastic membranes. We have developed a general geometric framework for constructing "principal objects" of various dimensions and topologies with the simplest quadratic form of the smoothness penalty, which allows very effective parallel implementations. Our approach is implemented in three programming languages (C++, Java and Delphi) with two graphical user interfaces (the VidaExpert http://bioinfo.curie.fr/projects/vidaexpert and ViMiDa http://bioinfo-out.curie.fr/projects/vimida applications). In this paper we give an overview of the method of elastic maps and present in detail one of its major applications: the visualization of microarray data in bioinformatics. We show that the method of elastic maps outperforms linear PCA in terms of data approximation, representation of the between-point distance structure, preservation of local point neighborhoods and representation of point classes in low-dimensional spaces. Comment: 35 pages, 10 figures
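The practical benefit of a quadratic smoothness penalty is that, once each data point is assigned to its nearest node, the energy is quadratic in the node positions and its minimum is found by solving one linear system per coordinate. The following Python sketch shows this alternating scheme for the simplest elastic net, a 1-D chain of nodes, under the assumptions of uniform stretching (λ) and bending (μ) coefficients and no scaling of these coefficients with the number of nodes; real elastic maps use 2-D grids, and `fit_elastic_chain` and `solve` are hypothetical helpers, not part of VidaExpert or ViMiDa.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system a x = b."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_elastic_chain(data, nodes, lam, mu, iters=20):
    """Fit a chain of nodes 0-1-2-... to data by alternating assignment and a linear solve."""
    k, dim, n_pts = len(nodes), len(data[0]), len(data)
    nodes = [list(y) for y in nodes]
    for _ in range(iters):
        # Step 1: assign each data point to its nearest node.
        labels = [min(range(k), key=lambda j: sq_dist(x, nodes[j])) for x in data]
        # Step 2: with assignments fixed the energy is quadratic, so the optimum solves
        # A y = b with A = diag(n_j / N) + lam * (chain Laplacian) + mu * sum over ribs of c c^T.
        a = [[0.0] * k for _ in range(k)]
        for j in labels:
            a[j][j] += 1.0 / n_pts
        for i in range(k - 1):  # springs on edges (i, i + 1)
            a[i][i] += lam
            a[i + 1][i + 1] += lam
            a[i][i + 1] -= lam
            a[i + 1][i] -= lam
        for j in range(1, k - 1):  # ribs (j - 1, j, j + 1) with c = (1, -2, 1)
            idx, c = (j - 1, j, j + 1), (1.0, -2.0, 1.0)
            for p in range(3):
                for q in range(3):
                    a[idx[p]][idx[q]] += mu * c[p] * c[q]
        new_nodes = [[0.0] * dim for _ in range(k)]
        for d in range(dim):  # one linear solve per coordinate
            b = [0.0] * k
            for x, j in zip(data, labels):
                b[j] += x[d] / n_pts
            for j, value in enumerate(solve(a, b)):
                new_nodes[j][d] = value
        nodes = new_nodes
    return nodes
```

Because the matrix is the same for every coordinate and changes only when assignments change, the coordinate solves are independent of each other, which is one natural place for the parallelism mentioned in the abstract.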

arXiv.org e-Print Archive

CiteSeerX

Crossref

Astrocytes organize associative memory

Author: Gorban AN
Gordleeva SY
Ivanchenko MV
Krivonosov MI
Lotareva YA
Zaikin AA
Publication venue: Springer Nature
Publication date: 04/09/2019

We investigate one aspect of the functional role played by astrocytes in the neuron-astrocyte networks of the mammalian brain. To highlight the effect of neuron-astrocyte interaction, we consider simplified networks with bidirectional neuron-astrocyte communication and without any connections between neurons. We show that two features alone, the fact that an astrocyte covers several neurons and the different time scale of calcium events in astrocytes, can lead to the appearance of neural associative memory. This mechanism makes neuronal networks more flexible for learning and, hence, may help explain why astrocytes were evolutionarily needed for the development of the mammalian brain.

UCL Discovery

Modeling Working Memory in a Spiking Neuron Network Accompanied by Astrocytes

Author: Gorban AN
Gordleeva SY
Ivanchenko MV
Kazantsev VB
Krivonosov MI
Tsybina YA
Zaikin AA
Publication date: 01/01/2021

We propose a novel biologically plausible computational model of working memory (WM) implemented by a spiking neuron network (SNN) interacting with a network of astrocytes. The SNN is modeled by synaptically coupled Izhikevich neurons with a non-specific connection architecture. Astrocytes generating calcium signals are connected by local gap-junction diffusive couplings and interact with neurons via chemicals diffused in the extracellular space. Calcium elevations occur in response to the increased concentration of the neurotransmitter released by spiking neurons when a group of them fire coherently. In turn, activated astrocytes release gliotransmitters that modulate the strength of the synaptic connections in the corresponding neuronal group. Input information is encoded as two-dimensional patterns of short current pulses applied to neurons. The output is taken from the frequencies of transient discharges of the corresponding neurons. We show how a set of information patterns with quite significant overlapping areas can be uploaded into the neuron-astrocyte network and stored for several seconds. Information retrieval is organized by applying a cue pattern representing one of the stored patterns distorted by noise. We found that successful retrieval, with a correlation between the recalled pattern and the ideal pattern exceeding 90%, is possible for the multi-item WM task. Having analyzed the dynamical mechanism of WM formation, we discovered that astrocytes operating at a time scale of a dozen seconds can successfully store traces of neuronal activations corresponding to information patterns. In the retrieval stage, the astrocytic network selectively modulates synaptic connections in the SNN, leading to successful recall. The information and dynamical characteristics of the proposed WM model agree with classical concepts and other WM models.
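The building block of the SNN is the Izhikevich neuron. As a hedged illustration, the sketch below integrates a single Izhikevich neuron with forward Euler under a constant input current, using the widely quoted regular-spiking parameters (a = 0.02, b = 0.2, c = -65, d = 8); the parameter values, the step size and the function name `izhikevich_spikes` are assumptions for illustration, and the synaptic coupling, astrocytes and gliotransmitter modulation of the model are not represented.

```python
def izhikevich_spikes(current, t_max=1000.0, dt=0.25, a=0.02, b=0.2, c=-65.0, d=8.0):
    """Spike times (ms) of one Izhikevich neuron driven by a constant input current.

    Forward-Euler integration of
        v' = 0.04 v^2 + 5 v + 140 - u + I,   u' = a (b v - u),
    with the reset v <- c, u <- u + d whenever v reaches 30 mV.
    """
    v, u = c, b * c
    spikes = []
    for step in range(int(t_max / dt)):
        if v >= 30.0:  # spike: record the time and apply the reset
            spikes.append(step * dt)
            v, u = c, u + d
        dv = 0.04 * v * v + 5.0 * v + 140.0 - u + current
        du = a * (b * v - u)
        v, u = v + dt * dv, u + dt * du
    return spikes
```

With these parameters the neuron stays at rest without input (the resting state near -70 mV is stable) and fires tonically once the current is large enough that the resting state disappears, e.g. `izhikevich_spikes(10.0)` returns a non-empty list of spike times.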

UCL Discovery

Robust simplifications of multiscale biochemical networks

Background: Cellular processes such as metabolism, decision making in development and differentiation, signalling, etc., can be modeled as large networks of biochemical reactions. In order to understand the functioning of these systems, there is a strong need for general model reduction techniques that simplify models without losing their main properties. In systems biology we also need to compare models or to couple them as parts of larger models. In these situations, reduction to a common level of complexity is needed.

Results: We propose a systematic treatment of model reduction of multiscale biochemical networks. First, we consider linear kinetic models, which appear as "pseudo-monomolecular" subsystems of multiscale nonlinear reaction networks. For such linear models, we propose a reduction algorithm based on a generalized theory of the limiting step that we have developed in [1]. Second, for non-linear systems we develop an algorithm based on dominant solutions of quasi-stationarity equations. For oscillating systems, quasi-stationarity and averaging are combined to eliminate time scales much faster and much slower than the period of the oscillations. In all cases, we obtain robust simplifications and also identify the critical parameters of the model. The methods are demonstrated on simple examples and on a more complex model of the NF-κB pathway.

Conclusion: Our approach allows critical parameter identification and produces hierarchies of models. Hierarchical modeling is important in "middle-out" approaches, when there is a need to zoom in and out across several levels of complexity. Critical parameter identification is an important issue in systems biology, with potential applications to biological control and therapeutics. Our approach also deals naturally with the presence of multiple time scales, which is a general property of systems biology models.
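The limiting-step idea can be seen on the smallest pseudo-monomolecular chain, A → B → C with rate constants k1 and k2. The Python sketch below compares the exact product curve with the classical reduced model in which only the slowest step is kept; it illustrates the classical limiting step only, not the generalized theory of [1] or the reduction algorithms of this paper, and the function names and rate constants are illustrative assumptions.

```python
import math


def product_exact(k1, k2, t):
    """C(t) for the linear chain A -k1-> B -k2-> C with A(0)=1, B(0)=C(0)=0 and k1 != k2."""
    return 1.0 - (k2 * math.exp(-k1 * t) - k1 * math.exp(-k2 * t)) / (k2 - k1)


def product_limiting_step(k1, k2, t):
    """Reduced model: the slowest step alone controls the kinetics."""
    return 1.0 - math.exp(-min(k1, k2) * t)


def max_error(k1, k2, t_max=10.0, n=1000):
    """Largest deviation of the reduced model from the exact solution on a time grid."""
    ts = [t_max * i / n for i in range(n + 1)]
    return max(abs(product_exact(k1, k2, t) - product_limiting_step(k1, k2, t)) for t in ts)
```

Subtracting the two expressions shows that the error never exceeds k_min / (k_max - k_min): about 0.01 for constants differing by a factor of 100, but up to 0.25 when they differ only by a factor of 2. This is why such simplifications are robust only when the constants are well separated.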

HAL-CentraleSupelec

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

INRIA a CCSD electronic archive server

PubMed Central

Hal-Diderot

HAL-Rennes 1

Leicester Research Archive

Improving Randomized Learning of Feedforward Neural Networks by Appropriate Generation of Random Parameters

Author: AN Gorban
B Igelnik
C Weipeng
D Husmeier
D Wang
G Dudek
J Principe
L Zhang
M Li
S Scardapane
Y-H Pao
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 15/08/2019

In this work, a method of random parameter generation for randomized learning of a single-hidden-layer feedforward neural network is proposed. The method first randomly selects the slope angles of the hidden neurons' activation functions from an interval adjusted to the target function, then randomly rotates the activation functions, and finally distributes them across the input space. For complex target functions, the proposed method gives better results than the approach commonly used in practice, where the random parameters are selected from a fixed interval. This is because it places the steepest fragments of the activation functions inside the input hypercube and avoids their saturated fragments.
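The three generation steps can be sketched in Python as follows, assuming logistic sigmoid activations, inputs scaled to the unit hypercube, a uniformly random direction for the rotation, and an inflection point drawn uniformly inside the hypercube. These are illustrative choices in the spirit of the abstract, not the paper's exact formulas; `generate_hidden_params` and `sigmoid_unit` are hypothetical names, and the output weights (usually fitted by least squares in randomized learning) are not shown.

```python
import math
import random


def generate_hidden_params(n_inputs, n_hidden, alpha_min, alpha_max, seed=0):
    """Random (weights, bias, inflection point) triples for sigmoid hidden neurons."""
    rng = random.Random(seed)
    params = []
    for _ in range(n_hidden):
        # 1. Slope angle of the sigmoid at its inflection point, from [alpha_min, alpha_max].
        alpha = rng.uniform(alpha_min, alpha_max)
        # The logistic sigmoid s(z) has derivative 1/4 at z = 0, so a weight vector of norm
        # 4 * tan(alpha) gives slope tan(alpha) along the weight direction.
        scale = 4.0 * math.tan(alpha)
        # 2. Random rotation: a uniformly distributed unit direction.
        direction = [rng.gauss(0.0, 1.0) for _ in range(n_inputs)]
        norm = math.sqrt(sum(x * x for x in direction))
        a = [scale * x / norm for x in direction]
        # 3. Place the inflection point (the steepest fragment) inside the unit hypercube.
        x_star = [rng.random() for _ in range(n_inputs)]
        b = -sum(ai * xi for ai, xi in zip(a, x_star))
        params.append((a, b, x_star))
    return params


def sigmoid_unit(a, b, x):
    """Output of one logistic hidden neuron for input vector x."""
    return 1.0 / (1.0 + math.exp(-(sum(ai * xi for ai, xi in zip(a, x)) + b)))
```

Choosing the bias from a point inside the hypercube, rather than drawing weights and biases from a fixed interval, guarantees that every neuron passes through its steep region somewhere within the data domain.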

arXiv.org e-Print Archive

Crossref

A Linear Algebra Approach for Detecting Binomiality of Steady State Ideals of Reversible Chemical Reaction Networks

Author: A Dickenstein
A Einstein
A Sadeghimanesh
AN Gorban
B Buchberger
B Sturmfels
C Conradi
D Eisenbud
D Grigoriev
DY Grigoriev
EW Mayr
F Boulier
F Horn
G Craciun
JC Faugère
JH Davenport
L Boltzmann
L Onsager
M Feinberg
M Pérez Millán
MP Millán
R Wegscheider
V Weispfenning
W Fulton
Publication date: 01/01/2020

Motivated by problems from Chemical Reaction Network Theory, we investigate whether steady state ideals of reversible reaction networks are generated by binomials. We take an algebraic approach that treats not only the species concentrations but also the rate constants as indeterminates. This leads us to the concept of unconditional binomiality, meaning binomiality for all values of the rate constants. This concept is different from conditional binomiality, which applies when rate constant values or relations among rate constants are given. We start by representing the generators of a steady state ideal as sums of binomials, which yields a corresponding coefficient matrix. On these grounds we propose an efficient algorithm for detecting unconditional binomiality. This algorithm uses only elementary column and row operations on the coefficient matrix. We prove asymptotic worst-case upper bounds on the time complexity of our algorithm. Furthermore, we experimentally compare its performance with other existing methods.
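A simpler relative of this linear-algebra idea can be sketched with numeric rate constants: write each steady-state polynomial as a row of coefficients over its monomials, row-reduce exactly, and check whether every reduced row has at most two nonzero entries. Row operations with constant coefficients do not change the ideal, so this is a sufficient condition for binomiality given fixed rate constants (a conditional check). It is not the authors' algorithm, which keeps the rate constants symbolic and works with sums of binomials; the function names are hypothetical.

```python
from fractions import Fraction


def rref(rows):
    """Exact reduced row echelon form of a coefficient matrix (rows = polynomials)."""
    m = [[Fraction(x) for x in r] for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    pivot_row = 0
    for col in range(n_cols):
        pr = next((r for r in range(pivot_row, n_rows) if m[r][col] != 0), None)
        if pr is None:
            continue
        m[pivot_row], m[pr] = m[pr], m[pivot_row]
        pv = m[pivot_row][col]
        m[pivot_row] = [x / pv for x in m[pivot_row]]
        for r in range(n_rows):
            if r != pivot_row and m[r][col] != 0:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == n_rows:
            break
    return m


def binomial_by_row_reduction(rows):
    """Sufficient condition: every reduced generator has at most two terms."""
    return all(sum(1 for x in r if x != 0) <= 2 for r in rref(rows))
```

For example, the network A ⇌ B ⇌ C with illustrative constants k1 = 1, k2 = 2 (for B → A), k3 = 3, k4 = 4 (for C → B) gives the rows `[[-1, 2, 0], [1, -5, 4], [0, 3, -4]]` over the monomials (a, b, c); they reduce to a - (8/3)c and b - (4/3)c, so the check succeeds, whereas a single trinomial such as `[[1, 1, 1]]` fails it.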

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

MPG.PuRe

Validation of nonlinear PCA

Author: A Herman
A Ilin
AN Gorban
B Chalmond
B Christiansen
B Efron
B Schölkopf
BW Lu
D DeMers
JB Tenenbaum
LK Saul
M Scholz
MA Kramer
Matthias Scholz
MR Hestenes
ND Lawrence
P Demartines
R Hecht-Nielsen
S Girard
S Harmeling
S Mika
ST Roweis
T Hastie
T Kohonen
WW Hsieh
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

Linear principal component analysis (PCA) can be extended to a nonlinear PCA by using artificial neural networks. However, the benefit of curved components requires careful control of the model complexity. Moreover, standard techniques for model selection, including cross-validation and, more generally, the use of an independent test set, fail when applied to nonlinear PCA because of its inherently unsupervised nature. This paper presents a new approach for validating the complexity of nonlinear PCA models by using the error in missing data estimation as a criterion for model selection. It is motivated by the idea that only the model of optimal complexity is able to predict missing values with the highest accuracy. While standard test set validation usually favours over-fitted nonlinear PCA models, the proposed model validation approach correctly selects the optimal model complexity. Comment: 12 pages, 5 figures

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Fondazione Edmund Mach

Deriving effective models for multiscale systems via evolutionary $\Gamma$-convergence

Author: A Mielke
AN Gorban
B Fiedler
D. Schüler
H Attouch
H Brézis
L Ambrosio
L Onsager
M Feinberg
M Grmela
MA Peletier
Mark A. Peletier
MH Duong
R Rossi
S Arnrich
S Serfaty
U Stefanelli
W Fenchel
Publication date: 01/01/2015

We discuss possible extensions of the recently established theory of evolutionary Gamma-convergence for gradient systems to nonlinear dynamical systems obtained by perturbation of a gradient system. In this way, it becomes possible to derive effective equations for pattern-forming systems with multiple scales. Our applications include homogenization of reaction-diffusion systems, the justification of amplitude equations for Turing instabilities, and the limit from pure diffusion to reaction-diffusion. This is achieved by generalizing the Gamma-limit approaches based on the energy-dissipation principle or the evolutionary variational estimate.
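For orientation, the energy-dissipation principle mentioned above can be stated as follows for a gradient system with energy $\mathcal{E}$, a convex dissipation potential $\mathcal{R}(u,\dot u)$ and its Legendre-Fenchel dual $\mathcal{R}^*(u,\xi)$. This is the standard form in which it is usually written, given here as a reference point rather than as the paper's generalized version for perturbed systems. A curve $u$ solves $\dot u \in \partial_\xi \mathcal{R}^*\big(u, -\mathrm{D}\mathcal{E}(u)\big)$ if and only if

```latex
\mathcal{E}\big(u(T)\big)
  + \int_0^T \Big[ \mathcal{R}\big(u(t),\dot u(t)\big)
  + \mathcal{R}^*\big(u(t), -\mathrm{D}\mathcal{E}(u(t))\big) \Big] \,\mathrm{d}t
  \;\le\; \mathcal{E}\big(u(0)\big).
```

The chain rule and the Fenchel-Young inequality give the reverse inequality for every sufficiently regular curve, so along solutions equality holds; evolutionary Gamma-convergence passes to the limit $\varepsilon \to 0$ in this inequality using liminf estimates for $\mathcal{E}_\varepsilon$, $\mathcal{R}_\varepsilon$ and $\mathcal{R}^*_\varepsilon$.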

Crossref

Publications Server of the Weierstrass Institute for Applied Analysis and Stochastics

Repositorium für Naturwissenschaften und Technik